Toshiaki Koike-Akino

Quantum-Native Maximum Likelihood Detection in Random Access Channel with Overloaded MIMO

May 19, 2026

Hyoga Iizumi, Naoki Ishikawa, Shunsuke Uehashi, Kota Nakamura, Shusaku Umeda, Toshiaki Koike-Akino

Abstract: In this paper, we propose a quantum-native formulation of maximum likelihood detection (MLD) for overloaded multiple-input multiple-output (MIMO) systems in a random access channel, where numerous user terminals share the same channel resource and transmit signals asynchronously. Classical linear detectors suffer from significant performance degradation in this scenario, whereas exhaustive-search MLD achieves optimal performance but incurs exponential computational complexity. To overcome this trade-off, we formulate MLD as a binary optimization problem and solve it via Grover adaptive search (GAS), a quantum exhaustive search algorithm that offers a quadratic speedup in fault-tolerant quantum computing. We then introduce a search space reduction technique that substantially decreases the required computational resources. In addition, we investigate efficient parameter settings for GAS through probability analysis to improve convergence performance. We demonstrate that the proposed detector achieves optimal detection performance while reducing the Grover rotation count required to reach the solution by up to approximately 65% compared with conventional GAS, showing its potential as a viable solution for future quantum-accelerated wireless systems.

* 11 pages, 10 figures
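
To make the binary formulation concrete, here is a minimal sketch under stated assumptions: BPSK symbols, a flat-fading complex channel matrix H, and a noiseless received vector. It rewrites the ML metric ||y - Hx||^2 as a quadratic unconstrained binary optimization (QUBO) objective in bits b, with x = 1 - 2b, and solves it by classical exhaustive search, which is the 2^N-point search that GAS is meant to accelerate. The helper names are hypothetical. Asynchronous random access and the paper's search space reduction are not modeled, and grover_iterations is only the textbook count for a single Grover search with a known number of marked items, not GAS's adaptive threshold schedule.

```python
import itertools
import math

import numpy as np


def ml_cost(b, y, H):
    """Direct ML metric ||y - H x||^2 with BPSK symbols x = 1 - 2b (bit 0 -> +1, bit 1 -> -1)."""
    x = 1 - 2 * b
    return float(np.linalg.norm(y - H @ x) ** 2)


def qubo_terms(y, H):
    """Expand ||y - H(1 - 2b)||^2 into b^T Q b + c^T b + const for a real binary vector b."""
    r = y - H @ np.ones(H.shape[1])
    Q = 4 * np.real(H.conj().T @ H)
    c = 4 * np.real(H.conj().T @ r)
    const = float(np.linalg.norm(r) ** 2)
    return Q, c, const


def mld_exhaustive(y, H):
    """Classical exhaustive search over all 2^N bit vectors of the QUBO objective."""
    Q, c, const = qubo_terms(y, H)
    best_b, best_cost = None, math.inf
    for bits in itertools.product((0.0, 1.0), repeat=H.shape[1]):
        b = np.array(bits)
        cost = float(b @ Q @ b + c @ b + const)
        if cost < best_cost:
            best_b, best_cost = b, cost
    return 1 - 2 * best_b, best_cost


def grover_iterations(n_qubits, n_marked=1):
    """Textbook Grover count floor(pi/4 * sqrt(2^n / M)) for M marked items (hypothetical helper)."""
    return math.floor(math.pi / 4 * math.sqrt(2 ** n_qubits / n_marked))


rng = np.random.default_rng(0)
n_rx, n_users = 4, 6  # overloaded: more users than receive antennas
H = (rng.standard_normal((n_rx, n_users)) + 1j * rng.standard_normal((n_rx, n_users))) / np.sqrt(2)
x_true = rng.choice([-1.0, 1.0], size=n_users)
y = H @ x_true  # noiseless received vector, for illustration only
x_hat, cost = mld_exhaustive(y, H)
print(x_hat, x_true, grover_iterations(n_users))
```

The classical loop evaluates all 2^6 = 64 candidates, whereas a single Grover search over the same space needs on the order of sqrt(2^N) oracle calls; that gap is the quadratic speedup the abstract refers to.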

Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment

May 13, 2026

Ye Wang, Jing Liu, Toshiaki Koike-Akino

Abstract: Inference-time alignment techniques offer a lightweight alternative or complement to costly reinforcement learning, while enabling continual adaptation as alignment objectives and reward targets evolve. Existing theoretical analyses justify these methods as approximations to sampling from distributions optimally tilted toward a given reward model. We extend these techniques by introducing reference-model temperature adjustment, which leads to further generalization of inference-time alignment to ensembles of generative reward models combined as a sharpened logarithmic opinion pool (SLOP). To mitigate reward hacking, we propose an algorithm for calibrating SLOP weight parameters and experimentally demonstrate that it improves robustness while preserving alignment performance.
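
The abstract frames both reward tilting and reference-model temperature adjustment as exponents on probability distributions. As a minimal sketch, assuming a finite candidate set whose log-probabilities under the reference model and under K generative reward models are already known, a logarithmic opinion pool raises each distribution to a per-model exponent, multiplies, and renormalizes. The function names, the reference exponent alpha, and the weights are illustrative assumptions. "Sharpened" is read here as exponents whose total exceeds one, which lowers entropy; the paper's weight-calibration algorithm is not reproduced.

```python
import numpy as np


def log_normalize(logits):
    """Numerically stable log-softmax."""
    z = logits - np.max(logits)
    return z - np.log(np.sum(np.exp(z)))


def log_opinion_pool(logp_ref, logp_rms, alpha, weights):
    """p(y) proportional to p_ref(y)**alpha * prod_k p_k(y)**w_k over a finite candidate set.

    logp_ref: (N,) reference log-probs; logp_rms: (K, N) reward-model log-probs;
    alpha: exponent on the reference model (an inverse temperature); weights: (K,) exponents.
    """
    logits = alpha * np.asarray(logp_ref) + np.asarray(weights) @ np.asarray(logp_rms)
    return np.exp(log_normalize(logits))


def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


p_ref = np.array([0.5, 0.3, 0.2])
p_rm = np.array([[0.2, 0.3, 0.5],   # hypothetical reward model 1 prefers candidate 2
                 [0.1, 0.6, 0.3]])  # hypothetical reward model 2 prefers candidate 1
pooled = log_opinion_pool(np.log(p_ref), np.log(p_rm), alpha=1.0, weights=[0.5, 0.5])
sharpened = log_opinion_pool(np.log(p_ref), np.log(p_rm), alpha=2.0, weights=[1.0, 1.0])
print(pooled.round(3), sharpened.round(3))
```

Doubling every exponent squares the pooled distribution before renormalization, so the sharpened pool concentrates more mass on the consensus candidate.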

Directional Embedding Smoothing for Robust Vision Language Models

Mar 16, 2026

Ye Wang, Jing Liu, Toshiaki Koike-Akino

Abstract: The safety and reliability of vision-language models (VLMs) are crucial to deploying trustworthy agentic AI systems. However, VLMs remain vulnerable to jailbreaking attacks that undermine their safety alignment and yield harmful outputs. In this work, we extend the Randomized Embedding Smoothing and Token Aggregation (RESTA) defense to VLMs and evaluate its performance on the JailBreakV-28K benchmark of multi-modal jailbreaking attacks. We find that RESTA is effective in reducing the attack success rate across this diverse corpus of attacks, particularly when employing directional embedding noise, where the injected noise is aligned with the original token embedding vectors. Our results demonstrate that RESTA can contribute to securing VLMs within agentic systems as a lightweight, inference-time defense layer of an overall security framework.

* Accepted at ICLR 2026 Workshop on Agents in the Wild
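
The distinction between isotropic and directional embedding noise can be shown in a few lines. This is a minimal sketch, assuming directional noise means a scalar Gaussian perturbation along each token's own embedding direction, and that token aggregation is a majority vote over next-token predictions from several noisy copies; the helper names and sigma are hypothetical, and the VLM forward pass that would produce those predictions is omitted.

```python
import numpy as np


def isotropic_noise(emb, sigma, rng):
    """Baseline: i.i.d. Gaussian noise on every embedding coordinate."""
    return emb + sigma * rng.standard_normal(emb.shape)


def directional_noise(emb, sigma, rng):
    """Noise along each token's own direction: e + sigma * z * e / ||e||, with z ~ N(0, 1)."""
    norms = np.linalg.norm(emb, axis=-1, keepdims=True)
    z = rng.standard_normal(norms.shape)  # one scalar per token
    return emb + sigma * z * emb / norms


def aggregate_tokens(predicted_ids):
    """Majority vote over next-token ids predicted from several noisy copies."""
    values, counts = np.unique(predicted_ids, return_counts=True)
    return int(values[np.argmax(counts)])


rng = np.random.default_rng(0)
emb = rng.standard_normal((5, 8))  # 5 tokens, embedding dimension 8
noisy = directional_noise(emb, 0.1, rng)
cos = np.sum(noisy * emb, axis=1) / (np.linalg.norm(noisy, axis=1) * np.linalg.norm(emb, axis=1))
print(cos.round(6))
```

Directional noise rescales each embedding without rotating it, so the cosine similarity to the clean embedding stays at 1, while isotropic noise of the same scale would also perturb the direction.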

Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

Mar 16, 2026

Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury, Jing Liu, Toshiaki Koike-Akino, Ming Jin, Ye Wang

Abstract: Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the model learns directly from test data without access to labels. However, this reliance on test data also makes TTT methods vulnerable to harmful prompt injections. In this paper, we investigate the safety vulnerabilities of TTT methods by studying a representative self-consistency-based method, test-time reinforcement learning (TTRL), which improves LLM reasoning by using majority vote as a self-consistency reward signal. We show that harmful prompt injection during TTRL amplifies the model's existing behaviors: safety amplification when the base model is relatively safe, and harmfulness amplification when it is vulnerable to the injected data. In both cases, reasoning ability declines, which we refer to as the reasoning tax. We also show that TTT methods such as TTRL can be exploited adversarially using specially designed "HarmInject" prompts that force the model to answer jailbreak and reasoning queries together, resulting in stronger harmfulness amplification. Overall, our results show that TTT methods that enhance LLM reasoning by promoting self-consistency can lead to amplification behaviors and reasoning degradation, underscoring the need for safer TTT methods.
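
The label-free reward described above can be sketched directly. Assuming each rollout's final answer has already been extracted as a string, every answer that matches the majority receives reward 1 and the rest receive 0. The function name is hypothetical, and the policy-gradient update that consumes these rewards is omitted. The sketch makes visible why such a reward reinforces whatever the model already answers most often.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Label-free reward: 1.0 if a sampled answer matches the majority answer, else 0.0."""
    majority, _ = Counter(answers).most_common(1)[0]  # ties go to the first-seen answer
    return majority, [1.0 if a == majority else 0.0 for a in answers]


answers = ["42", "42", "17", "42", "9"]  # final answers extracted from 5 rollouts
majority, rewards = majority_vote_rewards(answers)
print(majority, rewards)  # 42 [1.0, 1.0, 0.0, 1.0, 0.0]
```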

Geo-ADAPT-VQE: Quantum Information Metric-Aware Circuit Optimization for Quantum Chemistry

Mar 11, 2026

Mohammad Aamir Sohail, Toshiaki Koike-Akino

Abstract: Adaptive ansatz construction has emerged as a powerful technique for reducing circuit depth and improving optimization efficiency in variational quantum eigensolvers. However, existing adaptive methods, including ADAPT-VQE, rely solely on first-order gradients and therefore ignore the underlying geometry of the quantum state space, limiting both convergence behavior and operator-selection efficiency. We introduce Geo-ADAPT-VQE, a geometry-aware adaptive VQE algorithm that selects operators from a pool using the natural gradient rule. This geometric operator-selection rule lets the ansatz grow along directions aligned with the underlying quantum-state geometry, thereby improving convergence and reducing the algorithm's susceptibility to shallow local minima and saddle-point regions. We further provide an asymptotic convergence result. Numerical simulations on five molecules demonstrate that Geo-ADAPT-VQE achieves faster and more stable convergence than existing methods while producing significantly shorter ansätze. In particular, Geo-ADAPT-VQE achieves up to a 100-fold reduction in energy error compared with existing methods.

* 22 pages
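
To contrast plain ADAPT selection with a metric-aware rule, consider appending one operator A (anti-Hermitian) with a new parameter t, so that |psi(t)> = exp(tA)|psi>. The energy gradient at t = 0 is g = <psi|[H, A]|psi>, and the Fubini-Study metric of the new parameter is F = <psi|A^dagger A|psi> - |<psi|A|psi>|^2. The sketch below scores each pool operator by |g| (standard ADAPT) and by g^2 / F, the squared natural-gradient norm for that single parameter. This is one plausible reading of "natural gradient rule", not necessarily the paper's exact criterion; the toy Hamiltonian, state, and pool are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def selection_scores(H, psi, pool, eps=1e-12):
    """Return (|g|, g**2 / F) for each anti-Hermitian generator A at t = 0.

    g = <psi|[H, A]|psi> is the energy gradient; F is the Fubini-Study metric
    of the new parameter. Operators with F ~ 0 (psi is an eigenstate) score 0.
    """
    plain, natural = [], []
    for A in pool:
        g = np.real(np.vdot(psi, (H @ A - A @ H) @ psi))
        dpsi = A @ psi
        F = np.real(np.vdot(dpsi, dpsi) - abs(np.vdot(psi, dpsi)) ** 2)
        plain.append(abs(g))
        natural.append(g ** 2 / F if F > eps else 0.0)
    return np.array(plain), np.array(natural)


H = np.kron(Z, I2) + np.kron(I2, Z) + 0.5 * np.kron(X, X)  # toy 2-qubit Hamiltonian
rng = np.random.default_rng(1)
psi = rng.standard_normal(4) + 1j * rng.standard_normal(4)
psi /= np.linalg.norm(psi)
pool = [1j * np.kron(Y, I2), 1j * np.kron(I2, Y), 1j * np.kron(X, Y), 1j * np.kron(Y, X)]
plain, natural = selection_scores(H, psi, pool)
print("ADAPT picks", int(np.argmax(plain)), "| metric-aware rule picks", int(np.argmax(natural)))
```

Dividing by F favors operators whose gradient is large relative to how far they actually move the state, which is the sense in which the selection respects state-space geometry.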

AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent

Jun 11, 2025

Jing Liu, Toshiaki Koike-Akino, Ye Wang, Hassan Mansour, Matthew Brand

Abstract: To address the enormous size of Large Language Models (LLMs), model compression methods, such as quantization and pruning, are often deployed, especially on edge devices. In this work, we focus on layer-wise post-training quantization and pruning. Drawing connections between activation-aware weight pruning and sparse approximation problems, and motivated by the success of Iterative Hard Thresholding (IHT), we propose a unified method for Activation-aware Weight pruning and quantization via Projected gradient descent (AWP). Our experiments demonstrate that AWP outperforms state-of-the-art LLM pruning and quantization methods. Theoretical convergence guarantees of the proposed method for pruning are also provided.

* ICML 2025 workshop on Efficient Systems for Foundation Models
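
The link between activation-aware pruning and IHT can be sketched as projected gradient descent on the layer reconstruction error 0.5 ||XW - XW_hat||_F^2, where X holds calibration activations. Each step takes a gradient step and then projects back onto the sparsity constraint. This minimal version assumes a per-column top-k sparsity pattern and a 1/L step size, with L the largest eigenvalue of X^T X. The helper names are hypothetical, and AWP's quantization projection and exact schedule are not reproduced.

```python
import numpy as np


def topk_per_column(W, k):
    """Project onto matrices with at most k nonzeros per column, keeping the largest |W_ij|."""
    out = np.zeros_like(W)
    rows = np.argsort(-np.abs(W), axis=0)[:k]  # (k, d_out) row indices
    cols = np.arange(W.shape[1])
    out[rows, cols] = W[rows, cols]
    return out


def layer_loss(X, W, W_hat):
    """Activation-aware layer error 0.5 * ||X W - X W_hat||_F^2."""
    return 0.5 * np.linalg.norm(X @ (W - W_hat)) ** 2


def iht_prune(X, W, k, n_iter=200):
    """Projected gradient descent (iterative hard thresholding) on the layer error."""
    C = X.T @ X                             # (d_in, d_in) activation Gram matrix
    step = 1.0 / np.linalg.eigvalsh(C)[-1]  # 1 / Lipschitz constant of the gradient
    W_hat = topk_per_column(W, k)           # start from plain magnitude pruning
    for _ in range(n_iter):
        grad = C @ (W_hat - W)
        W_hat = topk_per_column(W_hat - step * grad, k)
    return W_hat


rng = np.random.default_rng(0)
d_in, d_out, k = 32, 16, 8
X = rng.standard_normal((256, d_in)) * rng.uniform(0.1, 3.0, size=d_in)  # uneven channel scales
W = rng.standard_normal((d_in, d_out))
W_mag = topk_per_column(W, k)
W_iht = iht_prune(X, W, k)
print(layer_loss(X, W, W_mag), layer_loss(X, W, W_iht))
```

With step 1/L the objective cannot increase from one iteration to the next, so the result is never worse than the magnitude-pruning starting point on the calibration data.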

TuneComp: Joint Fine-tuning and Compression for Large Foundation Models

May 27, 2025

Xiangyu Chen, Jing Liu, Ye Wang, Matthew Brand, Pu Wang, Toshiaki Koike-Akino

Abstract: To reduce model size during post-training, compression methods, including knowledge distillation, low-rank approximation, and pruning, are often applied after fine-tuning the model. However, sequential fine-tuning and compression sacrifices performance while creating a larger-than-necessary model as an intermediate step. In this work, we aim to close this gap by directly constructing a smaller model guided by the downstream task. We propose to jointly fine-tune and compress the model by gradually distilling it into a pruned low-rank structure. Experiments demonstrate that joint fine-tuning and compression significantly outperforms sequential compression methods.

* Preliminary Work
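
One building block of a gradually compressed low-rank structure is the truncated SVD, which gives the best rank-r approximation of a weight matrix in Frobenius norm (Eckart-Young). The sketch pairs it with a hypothetical linear rank schedule that shrinks the rank over training steps. The distillation loss, the pruning component, and the fine-tuning loop are not shown, and the schedule shape is an assumption rather than the paper's.

```python
import numpy as np


def low_rank_factors(W, rank):
    """Best rank-r approximation W ~ A @ B in Frobenius norm, via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # (d_out, rank)
    B = Vt[:rank]               # (rank, d_in)
    return A, B, s


def rank_schedule(step, total_steps, full_rank, target_rank):
    """Hypothetical linear schedule shrinking the rank during fine-tuning."""
    frac = min(step / total_steps, 1.0)
    return int(round(full_rank - frac * (full_rank - target_rank)))


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))
for step in (0, 50, 100):
    r = rank_schedule(step, 100, full_rank=48, target_rank=8)
    A, B, s = low_rank_factors(W, r)
    err = np.linalg.norm(W - A @ B)
    print(step, r, round(float(err), 3))
```

Storing A and B instead of W pays off once r * (d_out + d_in) is smaller than d_out * d_in, here once r drops below about 27.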

μ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts

May 24, 2025

Toshiaki Koike-Akino, Jing Liu, Ye Wang

Abstract: To tackle the huge computational demand of large foundation models, activation-aware compression techniques that require no retraining have been introduced. However, because these techniques rely on calibration data, domain shift may arise for unknown downstream tasks. With computationally efficient calibration, activation-aware pruning can be executed adaptively for every prompt while still reducing complexity at inference. We formulate this as a mixture of micro-experts, called μ-MoE. Several experiments demonstrate that μ-MoE can dynamically adapt to task- and prompt-dependent structured sparsity on the fly.

* 10 pages, 4 figures
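
Per-prompt activation-aware pruning can be sketched by computing the pruning score from the current prompt's own activations instead of a fixed calibration set, so each prompt selects its own subset of weights ("micro-experts"). This minimal version swaps in a Wanda-style score, |W_ij| times the L2 norm of input channel j over the prompt's tokens, and keeps a fixed number of inputs per output row. The score and the function name are assumptions; the paper's calibration and structured-sparsity pattern may differ.

```python
import numpy as np


def prompt_adaptive_mask(W, X, keep):
    """Boolean mask keeping `keep` inputs per output row, scored as |W_ij| * ||X[:, j]||_2.

    W: (d_out, d_in) weights; X: (n_tokens, d_in) activations from the current prompt only.
    """
    act_norm = np.linalg.norm(X, axis=0)        # per-input-channel activation norm
    score = np.abs(W) * act_norm                # broadcasts over output rows
    top = np.argsort(-score, axis=1)[:, :keep]  # kept input indices per row
    mask = np.zeros(W.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
prompt_a = rng.standard_normal((5, 16))
prompt_b = rng.standard_normal((7, 16)) * np.linspace(0.0, 2.0, 16)  # channel 0 is silent
mask_a = prompt_adaptive_mask(W, prompt_a, keep=4)
mask_b = prompt_adaptive_mask(W, prompt_b, keep=4)
y_b = prompt_b @ (W * mask_b).T  # sparse forward pass for prompt B
print((mask_a != mask_b).sum(), "mask entries differ between the two prompts")
```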

LatentLLM: Attention-Aware Joint Tensor Compression

May 23, 2025

Toshiaki Koike-Akino, Xiangyu Chen, Jing Liu, Ye Wang, Pu Wang, Matthew Brand

Abstract: Modern foundation models such as large language models (LLMs) and large multi-modal models (LMMs) require massive computational and memory resources. We propose a new framework to convert such LLMs/LMMs into a reduced-dimension latent structure. Our method extends a local activation-aware tensor decomposition to a global attention-aware joint tensor decomposition. Our framework can significantly improve model accuracy over existing model compression methods when reducing the latent dimension to realize computationally and memory-efficient LLMs/LMMs. We show the benefit on several benchmarks, including multi-modal reasoning tasks.

* 37 pages, 16 figures
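
The local activation-aware decomposition that LatentLLM starts from can be illustrated for a single linear layer. Since ||X M||_F = ||C^{1/2} M||_F with C = X^T X, the rank-r matrix minimizing the output error ||XW - XW_r||_F is obtained by truncating the SVD of C^{1/2} W and mapping back with C^{-1/2}. This is a minimal sketch assuming X^T X is invertible; the function names are hypothetical, and the paper's global attention-aware joint decomposition is not reproduced.

```python
import numpy as np


def plain_lowrank(W, r):
    """Truncated SVD of the weights alone (activation-agnostic)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def activation_aware_lowrank(X, W, r):
    """Rank-r W_r minimizing ||X W - X W_r||_F, assuming X^T X is invertible."""
    evals, evecs = np.linalg.eigh(X.T @ X)
    C_half = (evecs * np.sqrt(evals)) @ evecs.T      # C^{1/2}
    C_half_inv = (evecs / np.sqrt(evals)) @ evecs.T  # C^{-1/2}
    return C_half_inv @ plain_lowrank(C_half @ W, r)


rng = np.random.default_rng(0)
X = rng.standard_normal((512, 24)) @ rng.standard_normal((24, 24))  # correlated activations
W = rng.standard_normal((24, 16))
r = 4
err_plain = np.linalg.norm(X @ (W - plain_lowrank(W, r)))
err_aware = np.linalg.norm(X @ (W - activation_aware_lowrank(X, W, r)))
print(round(float(err_plain), 2), round(float(err_aware), 2))
```

Both results have rank r, but only the activation-aware one is optimal for the layer's outputs on the calibration data, which is the quantity that matters for downstream accuracy.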

Range Image-Based Implicit Neural Compression for LiDAR Point Clouds

Apr 24, 2025

Akihiro Kuwabara, Sorachi Kato, Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe

Abstract: This paper presents a novel scheme to efficiently compress Light Detection and Ranging (LiDAR) point clouds, enabling high-precision 3D scene archives that pave the way for a detailed understanding of the corresponding 3D scenes. We focus on 2D range images (RIs) as a lightweight format for representing 3D LiDAR observations. Although conventional image compression techniques can be adapted to improve compression efficiency for RIs, their practical performance is expected to be limited by differences in bit precision and by the distinct pixel value distributions of natural images and RIs. We propose a novel implicit neural representation (INR)-based RI compression method that effectively handles floating-point-valued pixels. The proposed method divides RIs into depth and mask images and compresses them using patch-wise and pixel-wise INR architectures, respectively, with model pruning and quantization. Experiments on the KITTI dataset show that the proposed method outperforms existing image, point cloud, RI, and INR-based compression methods in terms of 3D reconstruction and detection quality at low bitrates, as well as in decoding latency.
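
Range images and their depth/mask split come from a standard spherical projection of the point cloud. The sketch maps each point's azimuth to a column and its elevation to a row, keeps the nearest return per pixel, and marks empty pixels in a boolean mask. The 64 x 1024 resolution and the +3 deg / -25 deg vertical field of view are assumptions chosen to resemble a 64-beam sensor, and the function name is hypothetical; the INR-based compression itself is not shown.

```python
import numpy as np


def to_range_image(points, height=64, width=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherically project an (N, 3) point cloud to a depth image plus a validity mask."""
    points = points[np.linalg.norm(points, axis=1) > 0]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])             # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(points[:, 2] / r, -1.0, 1.0))  # elevation
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    col = np.floor((1.0 - (yaw + np.pi) / (2 * np.pi)) * width).astype(int) % width
    row = np.floor((fov_up - pitch) / (fov_up - fov_down) * height).astype(int)
    inside = (row >= 0) & (row < height)                     # drop points outside the vertical FOV
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (row[inside], col[inside]), r[inside])  # nearest return wins per pixel
    mask = np.isfinite(depth)
    depth[~mask] = 0.0
    return depth, mask


points = np.array([[10.0, 0.0, 0.0],   # straight ahead
                   [5.0, 0.0, 0.0],    # same pixel, closer
                   [1.0, 0.0, 10.0]])  # far above the vertical field of view
depth, mask = to_range_image(points)
print(mask.sum(), depth[mask])
```

Keeping the mask separate means the depth image only has to represent valid floating-point ranges, while the mask is a binary image with very different statistics, which is the motivation the abstract gives for compressing the two with different INR architectures.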
